13 research outputs found

    Bug or Not? Bug Report Classification Using N-Gram IDF

    Get PDF
    Previous studies have found that a significant number of bug reports are misclassified between bugs and non-bugs, and that manually classifying bug reports is a time-consuming task. To address this problem, we propose a bug reports classification model with N-gram IDF, a theoretical extension of Inverse Document Frequency (IDF) for handling words and phrases of any length. N-gram IDF enables us to extract key terms of any length from texts, these key terms can be used as the features to classify bug reports. We build classification models with logistic regression and random forest using features from N-gram IDF and topic modeling, which is widely used in various software engineering tasks. With a publicly available dataset, our results show that our N-gram IDF-based models have a superior performance than the topic-based models on all of the evaluated cases. Our models show promising results and have a potential to be extended to other software engineering tasks.Comment: 5 pages, ICSME 201

    ジッケンテキ ヒョウカ ニ ヨル アナロジー ベース ソフトウェア カイハツ コウスウ ヨソク ノ サイテキカ

    No full text
    博第1330号甲第1330号博士(工学)奈良先端科学技術大学院大

    OSS リポジトリ マイニング ノ タメ ノ パッチ ノ ダンカイテキ ショウニン ケンチ ギジュツ

    No full text
    修士(Master)工学(Engineering)奈良先端科学技術大学院大学修第5666

    LSA-X: Exploiting Productivity Factors in Linear Size Adaptation for Analogy-Based Software Effort Estimation

    Get PDF
    Analogy-based software effort estimation has gained a considerable amount of attention in current research and practice. Its excellent estimation accuracy relies on its solution adaptation stage, where an effort estimate is produced from similar past projects. This study proposes a solution adaptation technique named LSA-X that introduces an approach to exploit the potential of productivity factors, i.e., project variables with a high correlation with software productivity, in the solution adaptation stage. The LSA-X technique tailors the exploitation of the productivity factors with a procedure based on the Linear Size Adaptation (LSA) technique. The results, based on 19 datasets show that in circumstances where a dataset exhibits a high correlation coefficient between productivity and a related factor (r?0.30), the proposed LSA-X technique statistically outperformed (95% confidence) the other 8 commonly used techniques compared in this study. In other circumstances, our results suggest using any linear adaptation technique based on software size to compensate for the limitations of the LSA-X technique

    Bug or Not? Bug Report Classification Using N-Gram IDF

    No full text
    corecore